chore(ai): Add check-code-attribution skill (JAVA-499) #5449
Adds a check-code-attribution skill that validates license headers and THIRD_PARTY_NOTICES.md entries for code copied or adapted from third parties. It also verifies license compatibility against Sentry's licensing policy. Focus is limited to the branch diff. Issues are reported via PR comments (when run on CI) or to the terminal (when run locally).

To run it in Claude Code:

```
/check-code-attribution
```

Runs on CI automatically via [Warden](https://warden.sentry.dev/).
- Purely advisory / does not block merge.
- Generates PR comments with code suggestions for all discovered issues.
- Automatically manages removing stale comments as PRs are updated.

Current Warden configs:

| Setting | Value | Effect |
|---|---|---|
| `model` | anthropic/claude-sonnet-4-6 | Model used for analysis |
| `maxTurns` | 30 | Max tool calls per chunk |
| `skill` | check-code-attribution | Per-file vendored code attribution check |
| `failOn` | off | Do not fail workflow if attribution issues found |
| `reportOn` | medium | Show findings at >= medium severity via PR comment |
| `requestChanges` | false | Never post REQUEST_CHANGES comments on PRs |
| `failCheck` | false | No red X on workflow in GitHub UI if it fails |
| `triggers` | pull_request + local | Runs on PR open/sync and local warden invocations |
| `reportOnSuccess` | false (default) | No comment when everything is clean |

Going forward, we can consider blocking PRs once we've had a chance to vet behavior in the wild.
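As a rough illustration of how the `reportOn` and `failOn` thresholds in the configuration above interact, here is a minimal Python sketch. It is not Warden's implementation: the three-level severity scale, the function names, and the treatment of `"off"` are assumptions made only to show the advisory behavior (comment at medium or above, never fail).

```python
# Hypothetical severity scale, lowest to highest; Warden's real scale may differ.
SEVERITIES = ("low", "medium", "high")


def meets_threshold(severity, threshold):
    """True if `severity` is at or above `threshold`; "off" disables the check."""
    if threshold == "off":
        return False
    return SEVERITIES.index(severity) >= SEVERITIES.index(threshold)


def evaluate(finding_severities, report_on="medium", fail_on="off"):
    """Return (severities that would get a PR comment, whether the check fails)."""
    reported = [s for s in finding_severities if meets_threshold(s, report_on)]
    should_fail = any(meets_threshold(s, fail_on) for s in finding_severities)
    return reported, should_fail


# With the defaults mirroring the table: low findings are dropped, nothing fails.
reported, should_fail = evaluate(["low", "medium", "high"])
print(reported, should_fail)
```

Under this reading, moving to blocking behavior later would amount to changing `fail_on` from `"off"` to a severity level, with no change to what gets reported.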
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 93b92d7.
runningcode left a comment
Thanks for the well-thought-out approach as well as all the tests. I don't think we have much precedent for testing skills. Other repos use an LLM to judge that skills work correctly. That seems a bit overkill here, so I think your approach with the bash script is a good compromise. To be honest, I wasn't expecting any tests here.
I've approved this PR. You can consider all my comments to be nits and I'm happy to discuss them.
I wonder if we or someone has best practices on creating skills. In my head I imagined that skills should only describe "what" needs to be done and let the LLM figure out the "how", rather than explaining the "how" in the skill. I've left some comments to that effect.
We actually have /skill-writer, which was helpful.
Agreed. The "how" explanations came about slowly over time, as I tested the initial SKILL.md file, found it wasn't giving the results needed, and asked Claude to update it. I didn't warn Claude to maintain particular standards w/r/t its updates. That was a mistake on my part. (Presumably I should've enabled /skill-writer before making those changes. Learning as I go...)

📜 Description
Adds a `check-code-attribution` skill that validates license headers and `THIRD_PARTY_NOTICES.md` entries for code copied or adapted from third parties. Also verifies license compatibility against Sentry's Licensing Policy.
The skill focuses on the branch diff only. It's a pure-LLM approach, in contrast to the part-deterministic, part-LLM approach we decided against in #5401.
Reports findings via PR comments when run on CI, or to the terminal when run locally.
Local
To run it from Claude Code: `/check-code-attribution`
CI
Warden configuration ensures the skill runs automatically on all PRs:
💡 Motivation and Context
Third-party code attribution is a legal and compliance requirement. Currently, attribution errors are caught only during manual code review. This skill automates detection of vendored code in branch diffs and can help us flag missing or incomplete attributions before a PR is merged.
Background
Sentry SDKs and third-party code
Third-party code can enter Sentry’s SDKs (including sentry-java) in three ways:
1. Plain vanilla dependencies
2. Shaded code
3. Vendored code
All third-party code must be properly attributed, and licenses must be compatible with Sentry’s licensing policies.
Plain deps + shaded code: We run an `enforce-license-compliance` GitHub workflow that applies a FOSSA check to all plain vanilla dependencies and our few shaded dependencies, which ensures their licenses are properly attributed and are compatible with Sentry’s licensing policies.
Vendored code: Relies on a manual process where developers add attributions to files containing vendored code and include a corresponding entry in the `THIRD_PARTY_NOTICES.md` file that ships with the SDK. Developers are also responsible for ensuring license compatibility.
The criteria for what counts as a proper attribution of vendored code live in the `AGENTS.md` file under the heading “Third-Party Code Attribution”.
Goal of this PR: Create a skill that helps us properly attribute vendored code
Types of vendored code:
The skill introduced in this PR protects (1) from regression and identifies instances of (2). (Addressing (3) is out of scope – and is obviously non-trivial.)
The skill does not require license headers to match the template in `AGENTS.md` exactly, so long as all template fields are present.
That lets us maintain our current, diverse header formats and remain relatively unopinionated going forward.
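To make the "fields present, format free" rule concrete, here is a small Python sketch of a presence check. The skill itself is LLM-driven and does not run code like this; the field names and regular expressions below are assumptions for illustration, and the authoritative field list is the template in `AGENTS.md`. The example header and its URL are hypothetical.

```python
import re

# Hypothetical required fields; the real list lives in the AGENTS.md template.
REQUIRED_FIELDS = {
    "copyright": re.compile(r"copyright\s+(\(c\)\s*)?\d{4}", re.IGNORECASE),
    "license": re.compile(r"licensed under", re.IGNORECASE),
    "source": re.compile(r"https?://\S+"),
}


def missing_fields(header):
    """Return the names of required fields that do not appear in the header text."""
    return [name for name, pattern in REQUIRED_FIELDS.items()
            if not pattern.search(header)]


# Two differently formatted headers; only field presence matters, not layout.
block_style = """/*
 * Adapted from https://example.com/upstream/QueueFile.java
 *
 * Copyright (C) 2010 Square, Inc.
 *
 * Licensed under the Apache License, Version 2.0
 */"""
line_style = """// Adapted from https://example.com/upstream/lib
// Licensed under the Apache License, Version 2.0"""

print(missing_fields(block_style))
print(missing_fields(line_style))
```

The first header passes despite its layout differing from any particular template; the second is flagged only because the copyright line is absent.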
Example output
Local runs
PR comments
💚 How did you test it?
[1] Automated validation tests (`check-code-attribution-tests.sh`) with scenario files covering:
Note: the tests are not run on CI; they are only run manually at the moment (see the `check-code-attribution-tests.sh` script).
[2] Ran the skill on branches with known attribution issues to verify correct detection and reporting.
Manual tests + output
Note the skill's output format has changed since these tests were run, but the behavior remains the same.
Diff 1: Remove entire license header
Output 1
Required attribution field(s) removed:
Diff 2: Modify existing license header, but retain all required fields
Output 2
Vendored code detected (Apache Commons Collections) – verify that `THIRD_PARTY_NOTICES.md` reflects your updates.
Diff 3: Modify existing license header by removing one or more required fields
Output 3
Required attribution field(s) removed:
`Copyright (C) 2016 Matej Tymes` was removed from the license header. Please restore it.
Diff 4: Leave existing license header unchanged, but make an inconsistent modification to THIRD_PARTY_NOTICES.md entry
Output 4
Entry metadata inconsistent with source file headers:
`THIRD_PARTY_NOTICES.md`, but the source files (`QueueFile.java`, `FileObjectQueue.java`, `ObjectQueue.java`) all still say "Copyright (C) 2010 Square, Inc."
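The kind of inconsistency reported in Output 4 can be illustrated with a short Python sketch that compares the copyright line of a notices entry against the copyright lines in the source headers. The skill reaches this judgment with an LLM, not with code like this; the regular expression, the function names, and the altered year in the sample entry are hypothetical.

```python
import re

# Matches e.g. "Copyright (C) 2010 Square, Inc." and captures year and holder.
COPYRIGHT_RE = re.compile(r"copyright\s+(?:\(c\)\s*)?(\d{4})\s+(.+)", re.IGNORECASE)


def copyright_of(text):
    """Return (year, holder) from the first copyright line, or None if absent."""
    match = COPYRIGHT_RE.search(text)
    if match is None:
        return None
    year, holder = match.groups()
    return year, holder.strip()


def inconsistent_files(notice_entry, headers_by_file):
    """Return the files whose header copyright differs from the notices entry."""
    expected = copyright_of(notice_entry)
    return sorted(name for name, header in headers_by_file.items()
                  if copyright_of(header) != expected)


# Hypothetical data: the notices entry was edited, the source headers were not.
notice_entry = "Copyright (C) 2011 Square, Inc."
headers_by_file = {
    "QueueFile.java": " * Copyright (C) 2010 Square, Inc.",
    "FileObjectQueue.java": " * Copyright (C) 2010 Square, Inc.",
    "ObjectQueue.java": " * Copyright (C) 2010 Square, Inc.",
}
print(inconsistent_files(notice_entry, headers_by_file))
```

All three files are listed here because the entry's year no longer matches any header, which mirrors the situation Diff 4 sets up.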
Diff 5: Leave existing license header unchanged, but remove THIRD_PARTY_NOTICES.md entry
Output 5
Source file(s) still reference this library:
- `io.sentry.android.core.SentryShakeDetector` still contains attribution header for Square's Seismic. Either restore the `THIRD_PARTY_NOTICES.md` entry or remove the vendored code.
Diff 6: Add newly-vendored code with valid license header and THIRD_PARTY_NOTICES.md entry
Output 6
Vendored code detected (Dropwizard Metrics SlidingWindowReservoir) – verify that `THIRD_PARTY_NOTICES.md` reflects your updates.
Diff 7: Add newly-vendored code with valid license header but no THIRD_PARTY_NOTICES.md entry
Output 7
Vendored code detected (Caffeine Cache) – attribution header is complete.
`THIRD_PARTY_NOTICES.md`. An entry needs to be added.
Diff 8: Add newly-vendored code with an invalid license header and existing THIRD_PARTY_NOTICES.md entry
Output 8
Vendored code detected (Resilience4j RateLimiter) – missing required fields:
`Copyright 2019 Robert Winkler and Bohdan Storozhuk` and `Licensed under the Apache License, Version 2.0` (per the existing `THIRD_PARTY_NOTICES.md` entry).
Diff 9: Add newly-vendored code with an invalid license header and no THIRD_PARTY_NOTICES.md entry
Output 9
Vendored code detected (Guava RateLimiter) – missing required fields:
`THIRD_PARTY_NOTICES.md`. An entry needs to be added for Guava RateLimiter.
Diff 10: Add newly-vendored code with an invalid license header, no THIRD_PARTY_NOTICES.md entry, and a new license type
Output 10
Vendored code detected (compact-writer) – missing required fields:
`THIRD_PARTY_NOTICES.md`. An entry needs to be added.
`THIRD_PARTY_NOTICES.md`. Please verify it is compatible with Sentry's licensing policies: https://open.sentry.io/licensing/.
Diff 11: False positive
Output 11 (numbered as “1.” because it’s the first entry in the False Positives section)
[3] Local Warden runs.
Running Warden locally
Added to my `.*profile`:
[4] Pushed diffs with attribution violations to a draft PR to vet the UX (see #5444).
📝 Checklist
`sendDefaultPII` is enabled.
🔮 Next steps
`failOn`/`failCheck` once we've vetted behavior in the wild
#skip-changelog